Self-identification of protein-coding regions in microbial genomes.

Authors

  • S Audic
  • J M Claverie
Abstract

A new method for predicting protein-coding regions in microbial genomic DNA sequences is presented. It uses an ab initio iterative Markov modeling procedure to automatically partition genomic sequences into three subsets shown to correspond to coding, coding on the opposite strand, and noncoding segments. In contrast to current methods, such as GENEMARK [Borodovsky, M. & McIninch, J. D. (1993) Comput. Chem. 17, 123-133], no training set or prior knowledge of the statistical properties of the studied genome is required. This new method tolerates error rates of 1-2% and can process unassembled sequences. It is thus ideal for the analysis of genome survey and/or fragmented sequence data from uncharacterized microorganisms. The method was validated on 10 complete bacterial genomes (from four major phylogenetic lineages). The results show that protein-coding regions can be identified with an accuracy of up to 90% with a totally automated and objective procedure.
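The iterative partitioning idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The non-overlapping windows, the window length of 60, the Markov order of 2, the random initial assignment, the add-one smoothing, and the function names `train_markov`, `log_likelihood`, and `partition` are all assumptions made for illustration. The sketch only groups windows into three clusters; it does not decide which cluster is coding, opposite-strand coding, or noncoding.

```python
import math
import random
from collections import defaultdict


def train_markov(windows, order, alphabet="ACGT", pseudo=1.0):
    """Fit a fixed-order Markov model; return a function giving log P(base | context)."""
    counts = defaultdict(lambda: defaultdict(float))
    for w in windows:
        for i in range(order, len(w)):
            counts[w[i - order:i]][w[i]] += 1.0

    def log_prob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudo * len(alphabet)
        # Additive smoothing (an assumption) keeps unseen transitions finite.
        return math.log((ctx.get(base, 0.0) + pseudo) / total)

    return log_prob


def log_likelihood(window, log_prob, order):
    """Log-likelihood of one window under one model."""
    return sum(log_prob(window[i - order:i], window[i])
               for i in range(order, len(window)))


def partition(sequence, n_models=3, order=2, window=60, max_iter=50, seed=0):
    """Iteratively assign windows to the model that scores them best."""
    rng = random.Random(seed)
    windows = [sequence[i:i + window]
               for i in range(0, len(sequence) - window + 1, window)]
    # Hypothetical initialization: random labels, no training set.
    labels = [rng.randrange(n_models) for _ in windows]
    for _ in range(max_iter):
        # Re-estimate one model per cluster from its current members.
        models = [train_markov([w for w, l in zip(windows, labels) if l == k], order)
                  for k in range(n_models)]
        # Reassign every window to its highest-likelihood model.
        new_labels = [max(range(n_models),
                          key=lambda k: log_likelihood(w, models[k], order))
                      for w in windows]
        if new_labels == labels:  # stop when the partition no longer changes
            break
        labels = new_labels
    return windows, labels
```

Calling `partition(genome_string)` on an uppercase DNA string returns the windows and one cluster label per window. Interpreting the clusters biologically would need a further step that this sketch leaves out.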

Similar resources

The in Silico Characterization of a Salicylic Acid Analogue Coding Gene Clusters in Selected Pseudomonas Fluorescens Strains

Background: Microbial genome sequences provide a solid in silico framework for interpreting the biosynthetic potential of their drug-like chemical scaffolds. Pseudomonas fluorescens is a metabolically versatile species that produces therapeutically important natural products. Objectives: The main objective of the present study was to mine the publicly available data of P. fluorescens stra...

Intergenics: A tool for extraction of intergenic regions

Over the past decade, there has been an explosion of interest in searching for novel regulatory elements in the intergenic regions between protein-coding regions. Microbial genomes are the most exploited in terms of intergenic (noncoding) regions because of their lower complexity. We think the increasing pace of genome sequencing calls for a tool which will be useful for the extractio...

On the convergence of a clustering algorithm for protein-coding regions in microbial genomes

MOTIVATION: As the number of fully sequenced prokaryotic genomes continues to grow rapidly, computational methods for reliably detecting protein-coding regions become even more important. Audic and Claverie (1998) Proc. Natl Acad. Sci. USA, 95, 10026-10031, have proposed a clustering algorithm for protein-coding regions in microbial genomes. The algorithm is based on three Markov models of order...

Self-organizing Approach for Automated Gene Identification in Whole Genomes

An approach to identifying protein-coding regions in whole genomes is proposed, based on evolutionary considerations and on the simple idea of explicitly distinguishing the coding phase. For several genomes, the optimal window length for averaging the GC-content function and calculating codon frequencies has been found. It is shown that the structure of distribution of triplet fre...

Ab initio gene identification in metagenomic sequences

We describe an algorithm for gene identification in DNA sequences derived from shotgun sequencing of microbial communities. Accurate ab initio gene prediction in a short nucleotide sequence of anonymous origin is hampered by uncertainty in model parameters. While several machine learning approaches could be proposed to bypass this difficulty, one effective method is to estimate parameters from ...

Journal title:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 95, Issue 17

Pages -

Publication date 1998